
28 research outputs found

Improved Diversity Maximization Algorithms for Matching and Pseudoforest

Authors: Mahabadi Sepideh, Narayanan Shyam
Publication date: 09/07/2023

In this work we consider the diversity maximization problem, where given a data set $X$ of $n$ elements, and a parameter $k$, the goal is to pick a subset of $X$ of size $k$ maximizing a certain diversity measure. [CH01] defined a variety of diversity measures based on pairwise distances between the points. A constant factor approximation algorithm was known for all those diversity measures except "remote-matching", where only an $O(\log k)$ approximation was known. In this work we present an $O(1)$ approximation for this remaining notion. Further, we consider these notions from the perspective of composable coresets. [IMMM14] provided composable coresets with a constant factor approximation for all but "remote-pseudoforest" and "remote-matching", for which again they only obtained an $O(\log k)$ approximation. Here we also close the gap up to constants and present a constant factor composable coreset algorithm for these two notions. For remote-matching, our coreset has size only $O(k)$, and for remote-pseudoforest, our coreset has size $O(k^{1+\varepsilon})$ for any $\varepsilon > 0$, for an $O(1/\varepsilon)$-approximate coreset.

Comment: 27 pages, 1 table. Accepted to APPROX, 202

arXiv.org e-Print Archive

Improved Diversity Maximization Algorithms for Matching and Pseudoforest

Authors: Mahabadi Sepideh, Narayanan Shyam
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2023)
Publication date: 01/01/2023

Dagstuhl Research Online Publication Server

Towards Tight Bounds for the Streaming Set Cover Problem

Authors: Har-Peled Sariel, Indyk Piotr, Mahabadi Sepideh, Vakilian Ali
Publication venue: Association for Computing Machinery (ACM)
Publication date: 02/05/2016

We consider the classic Set Cover problem in the data stream model. For $n$ elements and $m$ sets ($m\geq n$) we give an $O(1/\delta)$-pass algorithm with a strongly sub-linear $\tilde{O}(mn^{\delta})$ space and logarithmic approximation factor. This yields a significant improvement over the earlier algorithm of Demaine et al. [DIMV14] that uses an exponentially larger number of passes. We complement this result by showing that the tradeoff between the number of passes and space exhibited by our algorithm is tight, at least when the approximation factor is equal to $1$. Specifically, we show that any algorithm that computes set cover exactly using $(\frac{1}{2\delta}-1)$ passes must use $\tilde{\Omega}(mn^{\delta})$ space in the regime of $m=O(n)$. Furthermore, we consider the problem in the geometric setting where the elements are points in $\mathbb{R}^2$ and sets are either discs, axis-parallel rectangles, or fat triangles in the plane, and show that our algorithm (with a slight modification) uses the optimal $\tilde{O}(n)$ space to find a logarithmic approximation in $O(1/\delta)$ passes. Finally, we show that any randomized one-pass algorithm that distinguishes between covers of size 2 and 3 must use a linear (i.e., $\Omega(mn)$) amount of space. This is the first result showing that a randomized, approximate algorithm cannot achieve a space bound that is sublinear in the input size. This indicates that using multiple passes might be necessary in order to achieve sub-linear space bounds for this problem while guaranteeing small approximation factors.

Comment: A preliminary version of this paper is to appear in PODS 201
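Illustrative sketch (not taken from the paper): for readers unfamiliar with the underlying problem, the classic offline greedy heuristic for Set Cover, which is the usual source of a logarithmic approximation factor, can be written as below. It holds the whole input in memory, so it is not the streaming algorithm described in the abstract; the function name `greedy_set_cover` and the toy instance are assumptions made for illustration.

```python
def greedy_set_cover(universe, sets):
    """Offline greedy Set Cover (NOT the paper's multi-pass streaming algorithm).

    universe: iterable of elements; sets: list of Python sets.
    Returns the indices of the chosen sets, in the order they were picked.
    """
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # Pick the set that covers the most still-uncovered elements.
        best = max(range(len(sets)), key=lambda i: len(sets[i] & uncovered))
        if not sets[best] & uncovered:
            raise ValueError("the given sets do not cover the universe")
        chosen.append(best)
        uncovered -= sets[best]
    return chosen


# Toy instance (hypothetical data): 7 elements, 6 sets.
universe = range(1, 8)
sets = [{1, 2, 3, 4}, {4, 5}, {5, 6, 7}, {1, 5}, {2, 6}, {3, 7}]
cover = greedy_set_cover(universe, sets)
print(cover)
```

On this toy instance the greedy rule first takes the four-element set and then the set containing the three remaining elements.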

arXiv.org e-Print Archive

CiteSeerX

DSpace@MIT

Crossref

Approximate nearest neighbor and its many variants

Author: Mahabadi Sepideh
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/2013

Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2013. Cataloged from PDF version of thesis. Includes bibliographical references (p. 53-55).

This thesis investigates two variants of the approximate nearest neighbor problem. First, motivated by the recent research on diversity-aware search, we investigate the k-diverse near neighbor reporting problem. The problem is defined as follows: given a query point q, report the maximum diversity set S of k points in the ball of radius r around q. The diversity of a set S is measured by the minimum distance between any pair of points in S (the higher, the better). We present two approximation algorithms for the case where the points live in a d-dimensional Hamming space. Our algorithms guarantee query times that are sub-linear in n and only polynomial in the diversity parameter k, as well as the dimension d. For low values of k, our algorithms achieve sub-linear query times even if the number of points within distance r from a query q is linear in n. To the best of our knowledge, these are the first known algorithms of this type that offer provable guarantees.

In the other variant, we consider the approximate line near neighbor (LNN) problem. Here, the database consists of a set of lines instead of points but the query is still a point. Let L be a set of n lines in the d-dimensional Euclidean space R^d. The goal is to preprocess the set of lines so that we can answer the Line Near Neighbor (LNN) queries in sub-linear time. That is, given the query point ... we want to report a line ... (if there is any), such that ... for some threshold value r, where ... is the Euclidean distance between them. We start by illustrating the solution to the problem in the case where there are only two lines in the database and present a data structure in this case. Then we show a recursive algorithm that merges these data structures and solves the problem for the general case of n lines. The algorithm has polynomial space and performs only a logarithmic number of calls to the approximate nearest neighbor subproblem.

by Sepideh Mahabadi. S.M
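Illustrative sketch (not the thesis's method): the k-diverse near neighbor objective defined above can be made concrete with a brute-force baseline. The code scans all points linearly, keeps those within Hamming distance r of q, and then applies the standard farthest-point greedy heuristic for max-min diversity. It has none of the sub-linear query-time guarantees the abstract describes; the function names and the bit-string data are assumptions made for illustration.

```python
def hamming(a, b):
    """Hamming distance between two equal-length bit strings."""
    return sum(x != y for x, y in zip(a, b))


def diversity(points):
    """Minimum pairwise distance of a set (needs at least two points)."""
    return min(hamming(a, b)
               for i, a in enumerate(points)
               for b in points[i + 1:])


def greedy_diverse_near_neighbors(data, q, r, k):
    """Linear-scan baseline: farthest-point greedy inside the ball B(q, r)."""
    ball = [p for p in data if hamming(p, q) <= r]
    if len(ball) <= k:
        return ball
    chosen = [ball[0]]
    while len(chosen) < k:
        # Add the point whose distance to the current selection is largest.
        nxt = max((p for p in ball if p not in chosen),
                  key=lambda p: min(hamming(p, c) for c in chosen))
        chosen.append(nxt)
    return chosen


# Hypothetical data set of 4-bit points.
data = ["0000", "0001", "0011", "0111", "1111", "1100"]
result = greedy_diverse_near_neighbors(data, q="0000", r=2, k=3)
print(result, diversity(result))
```

Starting the greedy from an arbitrary point of the ball keeps the sketch short; other starting rules are possible.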

CiteSeerX

DSpace@MIT

Approximate Sparse Linear Regression

Authors: Har-Peled Sariel, Indyk Piotr, Mahabadi Sepideh
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 45th International Colloquium on Automata, Languages, and Programming (ICALP 2018)
Publication date: 01/01/2018

In the Sparse Linear Regression (SLR) problem, given a d x n matrix M and a d-dimensional query q, the goal is to compute a k-sparse n-dimensional vector tau such that the error ||M tau - q|| is minimized. This problem is equivalent to the following geometric problem: given a set P of n points and a query point q in d dimensions, find the closest k-dimensional subspace to q that is spanned by a subset of k points in P. In this paper, we present data-structures/algorithms and conditional lower bounds for several variants of this problem (such as finding the closest induced k-dimensional flat/simplex instead of a subspace). In particular, we present approximation algorithms for the online variants of the above problems with query time O~(n^{k-1}), which are of interest in the "low sparsity regime" where k is small, e.g., 2 or 3. For k=d, this matches, up to polylogarithmic factors, the lower bound that relies on the affinely degenerate conjecture (i.e., deciding if n points in R^d contain d+1 points contained in a hyperplane takes Omega(n^d) time). Moreover, our algorithms involve formulating and solving several geometric subproblems, which we believe to be of independent interest

arXiv.org e-Print Archive

DSpace@MIT

Dagstuhl Research Online Publication Server

Approximation Algorithms for Fair Range Clustering

Authors: Hotegni Sèdjro S., Mahabadi Sepideh, Vakilian Ali
Publication date: 22/06/2023

This paper studies the fair range clustering problem in which the data points are from different demographic groups and the goal is to pick $k$ centers with the minimum clustering cost such that each group is at least minimally represented in the centers set and no group dominates the centers set. More precisely, given a set of $n$ points in a metric space $(P,d)$ where each point belongs to one of the $\ell$ different demographics (i.e., $P = P_1 \uplus P_2 \uplus \cdots \uplus P_\ell$) and a set of $\ell$ intervals $[\alpha_1, \beta_1], \cdots, [\alpha_\ell, \beta_\ell]$ on the desired number of centers from each group, the goal is to pick a set of $k$ centers $C$ with minimum $\ell_p$-clustering cost (i.e., $(\sum_{v\in P} d(v,C)^p)^{1/p}$) such that for each group $i\in [\ell]$, $|C\cap P_i| \in [\alpha_i, \beta_i]$. In particular, the fair range $\ell_p$-clustering captures fair range $k$-center, $k$-median and $k$-means as its special cases. In this work, we provide efficient constant factor approximation algorithms for fair range $\ell_p$-clustering for all values of $p\in [1,\infty)$.

Comment: ICML 202
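Illustrative sketch (not the paper's algorithm): the objective and the range constraints stated above can be checked directly on a small instance. The code below evaluates the $\ell_p$-clustering cost, tests whether a center set respects the per-group intervals, and finds the optimum by exhaustive search over all k-subsets, which is only feasible for toy inputs. All function names, the point set on a line, and the group labels are assumptions made for illustration.

```python
from itertools import combinations


def lp_cost(points, centers, dist, p):
    """(sum_v d(v, C)^p)^(1/p) for finite p >= 1."""
    return sum(min(dist(v, c) for c in centers) ** p for v in points) ** (1.0 / p)


def satisfies_ranges(centers, group_of, ranges):
    """True iff every group i has between alpha_i and beta_i centers."""
    counts = {g: 0 for g in ranges}
    for c in centers:
        counts[group_of[c]] += 1
    return all(lo <= counts[g] <= hi for g, (lo, hi) in ranges.items())


def brute_force_fair_range(points, k, group_of, ranges, dist, p):
    """Exhaustive search over k-subsets; exponential, for toy inputs only."""
    feasible = (c for c in combinations(points, k)
                if satisfies_ranges(c, group_of, ranges))
    return min(feasible, key=lambda c: lp_cost(points, c, dist, p))


# Hypothetical instance: six points on a line, two groups "a" and "b".
points = [0, 1, 2, 10, 11, 12]
group_of = {0: "a", 1: "a", 2: "a", 10: "b", 11: "b", 12: "b"}
ranges = {"a": (1, 1), "b": (1, 2)}  # [alpha_i, beta_i] per group
dist = lambda u, v: abs(u - v)

best = brute_force_fair_range(points, 2, group_of, ranges, dist, p=1)
print(best, lp_cost(points, best, dist, 1))
```

Setting p=1 gives the k-median cost and p=2 the square root of the k-means cost.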

arXiv.org e-Print Archive

Simultaneous nearest neighbor search

Authors: Indyk Piotr, Kleinberg Robert, Mahabadi Sepideh, Yuan Yang
Publication venue: Dagstuhl Publishing
Publication date: 01/01/2016

Motivated by applications in computer vision and databases, we introduce and study the Simultaneous Nearest Neighbor Search (SNN) problem. Given a set of data points, the goal of SNN is to design a data structure that, given a collection of queries, finds a collection of close points that are compatible with each other. Formally, we are given k query points Q=q_1,...,q_k, and a compatibility graph G with vertices in Q, and the goal is to return data points p_1,...,p_k that minimize (i) the weighted sum of the distances from q_i to p_i and (ii) the weighted sum, over all edges (i,j) in the compatibility graph G, of the distances between p_i and p_j. The problem has several applications in computer vision and databases, where one wants to return a set of *consistent* answers to multiple related queries. Furthermore, it generalizes several well-studied computational problems, including Nearest Neighbor Search, Aggregate Nearest Neighbor Search and the 0-extension problem. In this paper we propose and analyze the following general two-step method for designing efficient data structures for SNN. In the first step, for each query point q_i we find its (approximate) nearest neighbor point p'_i; this can be done efficiently using existing approximate nearest neighbor structures. In the second step, we solve an off-line optimization problem over sets q_1,...,q_k and p'_1,...,p'_k; this can be done efficiently given that k is much smaller than n. Even though p'_1,...,p'_k might not constitute the optimal answers to queries q_1,...,q_k, we show that, for the unweighted case, the resulting algorithm satisfies an O(log k/log log k)-approximation guarantee. Furthermore, we show that the approximation factor can in fact be reduced to a constant for compatibility graphs frequently occurring in practice, e.g., 2D grids, 3D grids or planar graphs. Finally, we validate our theoretical results by preliminary experiments. In particular, we show that the empirical approximation factor provided by the above approach is very close to 1
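Illustrative sketch (one reading of the two-step method, not the authors' implementation): the unweighted SNN objective and the two steps can be tried on a toy instance. Step 1 uses an exact linear scan in place of an approximate nearest neighbor structure, and step 2 is solved by brute force over all assignments of the candidate points p'_1,...,p'_k to the queries, which is exponential in k. The function names and the 1D data are assumptions made for illustration.

```python
from itertools import product


def snn_cost(queries, answers, edges, dist):
    """Unweighted SNN objective: query-to-answer plus edge terms."""
    unary = sum(dist(q, p) for q, p in zip(queries, answers))
    pairwise = sum(dist(answers[i], answers[j]) for i, j in edges)
    return unary + pairwise


def two_step_snn(data, queries, edges, dist):
    # Step 1: nearest neighbor of each query (exact linear scan here).
    candidates = [min(data, key=lambda p: dist(q, p)) for q in queries]
    labels = list(dict.fromkeys(candidates))  # deduplicate, keep order
    # Step 2: brute-force optimization restricted to the candidate set.
    return min(product(labels, repeat=len(queries)),
               key=lambda a: snn_cost(queries, a, edges, dist))


# Hypothetical 1D instance: two queries joined by one compatibility edge.
data = [0, 4, 5, 10]
queries = [1, 4]
edges = [(0, 1)]
dist = lambda u, v: abs(u - v)

answer = two_step_snn(data, queries, edges, dist)
print(answer, snn_cost(queries, answer, edges, dist))
```

Here answering each query independently with its own nearest neighbor costs 5, while the restricted optimization assigns both queries to the same point for a cost of 3.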

arXiv.org e-Print Archive

DSpace@MIT

Dagstuhl Research Online Publication Server